Moving object detection in image frames based on optical flow maps

ABSTRACT

An apparatus and method for detection of moving objects in a sequence of frames are provided. The apparatus includes circuitry, a memory, and an image-capture device. The circuitry derives a first optical flow map based on motion information of a first frame and a second frame of a plurality of frames, and a second optical flow map based on motion information of the second frame and a third frame of the plurality of frames. The circuitry identifies a first foreground region that corresponds to the moving objects across the first frame and the second frame, warps the identified first foreground region across the first frame and the second frame, and identifies a second foreground region that corresponds to the moving objects across the second frame and the third frame. The moving objects are detected based on a combination of a plurality of pixels in the warped first foreground region and a plurality of pixels in the second foreground region.

REFERENCE

None.

FIELD

Various embodiments of the disclosure relate to object detection in computer vision. More specifically, various embodiments of the disclosure relate to an apparatus and method for moving object detection in image frames based on optical flow maps.

BACKGROUND

Recent advancements in the field of computer vision have led to the development of various techniques for moving object detection in a sequence of image frames, such as video content. Such techniques may be useful in various applications, for example, video-surveillance applications, activity recognition applications, auto-focus applications, or the detection of objects-of-interest in camera applications. One such image processing technique is image segmentation, which may refer to the partitioning of an image into several regions based on certain rules. Although various segmentation methods are known to separate a foreground region (object) from the background of an image or a video, their complexity, accuracy, and computational resource requirements vary with the objectives to be achieved.

In certain scenarios, as a result of the severe noise present in most image and depth sensors, the boundaries of foreground object regions obtained by relying heavily on values from such sensors are often not smooth. Invalid values from the image and depth sensors may also leave undesired holes within the foreground object regions. In certain scenarios, foreground detection and its segregation from the background in a sequence of frames may be performed based on an optical flow procedure. However, when moving objects corresponding to the foreground region are detected during an image or video capture, noise may be present in the optical flow procedure. In such scenarios, optical flow-based techniques may lead to erroneous detection of objects-of-interest while the objects are in motion.

Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.

SUMMARY

An apparatus and method for moving object detection in a sequence of frames based on optical flow are provided substantially as shown in, and/or described in connection with, at least one of the figures, as set forth more completely in the claims.

These and other features and advantages of the present disclosure may be appreciated from a review of the following detailed description of the present disclosure, along with the accompanying figures in which like reference numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that illustrates an exemplary network environment for detection of moving objects in a sequence of frames, in accordance with an embodiment of the disclosure.

FIG. 2 is a block diagram that illustrates an exemplary image processing apparatus for detection of moving objects in a sequence of frames, in accordance with an embodiment of the disclosure.

FIG. 3 illustrates an exemplary scenario and operations for detection of moving objects in a sequence of frames by the image processing apparatus of FIG. 2, in accordance with an embodiment of the disclosure.

FIG. 4 depicts a flow chart that illustrates an exemplary method for moving object detection in a sequence of frames, in accordance with an embodiment of the disclosure.

DETAILED DESCRIPTION

The following described implementations may be found in the disclosed image processing apparatus and method for detection of moving objects in a sequence of frames. Exemplary aspects of the disclosure may include an image processing apparatus that may include circuitry configured to capture, by an image-capture device, a plurality of frames in a sequence. The plurality of frames may include at least a first frame, a second frame, and a third frame. The circuitry may be further configured to derive a first optical flow map based on motion information of the first frame and the second frame of the plurality of frames captured by the image-capture device. The derivation of the first optical flow map may be further based on a difference between the pixel positions of a plurality of pixels in the second frame and in the first frame. The first optical flow map may comprise a plurality of first motion vector values that correspond to the motion of the plurality of pixels from the first frame to the second frame.

In accordance with an embodiment, the circuitry may be configured to derive a second optical flow map based on motion information of the second frame and the third frame of the plurality of frames. The derivation of the second optical flow map may be further based on a difference between the pixel positions of the plurality of pixels in the third frame and in the second frame. The second optical flow map may comprise a plurality of second motion vector values that correspond to the motion of the plurality of pixels from the second frame to the third frame.
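
Expressed as equations, and assuming that the same scene point is imaged at pixel position p_k = (x_k, y_k) in frame k (notation introduced here for illustration only, not taken from the claims), the first and second optical flow maps assign to that point the motion vectors below. Dividing either vector by the frame interval gives the apparent velocity that an optical flow map represents.

```latex
\mathbf{v}_{1\to 2}(\mathbf{p}_1) = \mathbf{p}_2 - \mathbf{p}_1 = (x_2 - x_1,\; y_2 - y_1),
\qquad
\mathbf{v}_{2\to 3}(\mathbf{p}_2) = \mathbf{p}_3 - \mathbf{p}_2 = (x_3 - x_2,\; y_3 - y_2)
```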

In accordance with an embodiment, the circuitry may be further configured to detect a background region in each of the plurality of frames based on a plurality of optical flow maps for the plurality of frames. The background region may be detected based on gyro information and on optical flow maps derived from the displacement of pixels across different frames of the captured plurality of frames.

In accordance with an embodiment, the circuitry may be configured to identify a first foreground region based on the derived first optical flow map. The identified first foreground region may correspond to at least one moving object across the first frame and the second frame. The circuitry may be configured to warp the identified first foreground region associated with the moving object across the first frame and the second frame based on the derived first optical flow map. This warping process may be used to verify the identified first foreground region. The warping of the identified first foreground region may be further based on mapping each pixel of a plurality of pixels in the second frame to a corresponding pixel in the first frame.
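
The warping step can be illustrated with a minimal Python/NumPy sketch. It assumes that the foreground region is a binary mask, that the flow map is an (H, W, 2) array of (dy, dx) displacements, and that nearest-neighbour sampling is acceptable; the function name `warp_mask` is hypothetical, and the disclosure does not specify an interpolation scheme.

```python
import numpy as np


def warp_mask(mask, flow):
    """Backward-warp a binary foreground mask with a dense flow map.

    For every target pixel q, the source pixel q - flow[q] is looked up with
    nearest-neighbour rounding and its mask value is copied. Using the flow at
    the target as a stand-in for the flow at the source is an assumption that
    holds reasonably well when the flow is locally smooth. Source positions
    that fall outside the frame are treated as background.
    """
    mask = np.asarray(mask, dtype=bool)
    flow = np.asarray(flow, dtype=np.float64)
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.rint(ys - flow[..., 0]).astype(np.int64)
    src_x = np.rint(xs - flow[..., 1]).astype(np.int64)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    warped = np.zeros_like(mask)
    warped[valid] = mask[src_y[valid], src_x[valid]]
    return warped


# A 2x2 region moved by a uniform flow of (dy, dx) = (1, 2).
fg1 = np.zeros((10, 10), dtype=bool)
fg1[2:4, 2:4] = True
flow = np.zeros((10, 10, 2))
flow[..., 0], flow[..., 1] = 1, 2
warped = warp_mask(fg1, flow)
print(np.argwhere(warped).tolist())  # [[3, 4], [3, 5], [4, 4], [4, 5]]
```

Backward mapping (target to source) is used because it follows the description of mapping each pixel of the second frame to a pixel in the first frame, and it leaves no unfilled gaps in the target grid.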

In accordance with an embodiment, the circuitry may be further configured to identify, based on the derived second optical flow map, a second foreground region that corresponds to the at least one moving object across the second frame and the third frame. The circuitry may be configured to detect the at least one moving object based on a combination of a plurality of pixels in the warped first foreground region and a plurality of pixels in the second foreground region that corresponds to the moving object in the third frame. The detection of the at least one moving object may be further based on an overlap of the second foreground region for the at least one moving object over the warped first foreground region for the corresponding moving object.

In accordance with an embodiment, the circuitry may be further configured to determine a final foreground region based on a combination of the plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region. The circuitry may be configured to apply morphological operations to the final foreground region that corresponds to the detected at least one moving object in the plurality of frames. In accordance with an embodiment, the morphological operations comprise morphological dilation and morphological erosion. The circuitry may be further configured to label a foreground region corresponding to the detected at least one moving object in the plurality of frames.
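
Morphological dilation followed by erosion (a closing) fills small holes and gaps in a binary foreground mask, after which connected foreground pixels can be labelled as separate objects. The minimal Python/NumPy sketch below assumes a square structuring element, zero padding at the frame border, and 4-connectivity for labelling. All function names are hypothetical, and the disclosure does not state the order of the operations, the element size, or the labelling method.

```python
from collections import deque

import numpy as np


def _shifted_views(mask, radius):
    """Yield the mask shifted by every offset of a (2r+1) x (2r+1) square."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            yield padded[dy:dy + h, dx:dx + w]


def dilate(mask, radius=1):
    out = np.zeros(np.shape(mask), dtype=bool)
    for view in _shifted_views(np.asarray(mask, dtype=bool), radius):
        out |= view  # a pixel turns on if any neighbour is on
    return out


def erode(mask, radius=1):
    out = np.ones(np.shape(mask), dtype=bool)
    for view in _shifted_views(np.asarray(mask, dtype=bool), radius):
        out &= view  # a pixel stays on only if all neighbours are on
    return out


def label(mask):
    """4-connected component labelling by breadth-first search."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


final_fg = np.zeros((12, 12), dtype=bool)
final_fg[2:7, 2:7] = True
final_fg[4, 4] = False          # a hole inside the first object
final_fg[9:11, 9:11] = True     # a second, separate object
closed = erode(dilate(final_fg))  # morphological closing
labels, n_objects = label(closed)
print(bool(closed[4, 4]), int(closed.sum()), n_objects)  # True 29 2
```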

FIG. 1 is a block diagram that illustrates an exemplary network environment for moving object detection in a sequence of frames, in accordance with an embodiment of the disclosure. With reference to FIG. 1, there is shown a network environment 100. The network environment 100 may include an image processing apparatus 102 communicatively coupled to an image-capture device 102A and a server 104, via a communication network 106. In some embodiments, the image processing apparatus 102 may be utilized to capture a plurality of frames of a scene (for example, a scene 108) in a field-of-view (FOV) of the image-capture device 102A. Such a scene may include a plurality of objects (for example, the plurality of objects 110) that may be in motion in successive frames of the captured plurality of frames (in sequence) of the scene (for example, the scene 108).

The image processing apparatus 102 may comprise suitable circuitry and interfaces that may be configured to segregate the foreground region and the background region in the plurality of frames based on an optical flow map technique. The image processing apparatus 102 may be configured to dynamically segment a plurality of objects (for example, the plurality of objects 110) as a foreground region, which may be detected in the plurality of frames in real time or near-real time. Examples of the image processing apparatus 102 may include, but are not limited to, action cams, autofocus cams, digital cameras, camcorders, camera phones, dash cams, closed circuit television (CCTV) cams, Internet Protocol (IP) cams, reflex cams, traffic cams, projectors, web-cams, computer workstations, mainframe computers, handheld computers, cellular/mobile phones, smart appliances, video players, DVD writer/players, smart televisions, a head-mounted device (HMD), and an augmented reality-based device.

The image-capture device 102A may comprise suitable circuitry and interfaces that may be configured to capture a plurality of frames in a sequence from a scene in a field-of-view (FOV) of the image-capture device 102A. The image-capture device 102A may further include a viewfinder that may be configured to compose and focus on a scene that may include a plurality of objects (for example, the scene 108 that includes the plurality of objects 110). The image-capture device 102A may be configured to store the captured plurality of frames in a local buffer, a memory, and/or the server 104. Examples of the image-capture device 102A may include, but are not limited to, action cams, autofocus cams, digital cameras, camcorders, camera phones, dash cams, closed circuit television (CCTV) cams, Internet Protocol (IP) cams, reflex cams, traffic cams, projectors, web-cams, computer workstations, mainframe computers, handheld computers, cellular/mobile phones, smart appliances, video players, DVD writer/players, smart televisions, a head-mounted device (HMD), and an augmented reality-based device.

The server 104 may comprise suitable logic, circuitry, and interfaces that may be configured to store data and communicate with the image processing apparatus 102. The server 104 may further include a set of databases to store a set of video feeds (of sets of frames) of different scenes. In one embodiment, the computational resources required to detect different moving objects in a scene may be shared between the image processing apparatus 102 and the server 104. In another embodiment, only the computational resources of the server 104 may be utilized to detect different moving objects in a scene (for example, the scene 108). Examples of the server 104 may include, but are not limited to, a web server, a database server, a file server, an application server, and a cloud server.

The communication network 106 may include a medium through which the image processing apparatus 102 may communicate with the server 104. Examples of the communication network 106 may include, but are not limited to, the Internet, a cloud network, a Long Term Evolution (LTE) network, a Wireless Local Area Network (WLAN), a Local Area Network (LAN), a telephone line (POTS), and a Metropolitan Area Network (MAN). Various devices in the network environment 100 may be configured to connect to the communication network 106, in accordance with various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, at least one of a Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, light fidelity (Li-Fi), 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device to device communication, cellular communication protocols, or Bluetooth (BT) communication protocols.

In operation, a plurality of frames may be obtained in sequence by the image processing apparatus 102. The plurality of frames may refer to a sequence of images that includes a scene (for example, the scene 108) having a plurality of objects (for example, the plurality of objects 110) in motion. In accordance with an embodiment, such a plurality of frames may be captured in real time or near-real time by the image-capture device 102A, which is communicatively coupled to the image processing apparatus 102. The image-capture device 102A may be configured to capture the plurality of frames of a scene (for example, the scene 108) within the FOV of the image-capture device 102A. In accordance with another embodiment, the plurality of frames may be obtained from a database of images or videos stored on the server 104, via the communication network 106. In accordance with an embodiment, the image processing apparatus 102 may be a camera with an integrated image sensor, such as the image-capture device 102A.

Each frame in the plurality of frames may include a foreground region and a background region. The foreground region may include different objects that may be engaged in motion in the plurality of frames, and the background region may include the remaining portion of the plurality of frames apart from the foreground region. In some embodiments, the background region in each frame of the plurality of frames may be a static background region, with the foreground region associated with a change in position of a specific set of pixel values in successive frames of the plurality of frames. In other embodiments, the background region in some of the frames of the plurality of frames may be a semi-static background region, where a portion of the background region may exhibit motion. The plurality of frames may include a plurality of moving objects composed of at least a first portion engaged in motion in the scene (for example, the scene 108) and a second portion occluded by different objects in the foreground region of the plurality of frames. In accordance with an embodiment, the image-capture device 102A may be static or in motion.

The detection of the plurality of objects that may be engaged in motion in the plurality of frames may require at least three frames for a verified detection. Thus, the captured plurality of frames may include at least a first frame, a second frame, and a third frame in a sequential order. For example, the frames may be captured in a one-minute segment of a 30 frames per second (FPS) video feed.
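
The arithmetic of this example, and one way of grouping the frames into the three-frame windows that the detection needs, can be sketched in Python. Sliding the window forward by one frame at a time is an assumption made for illustration, since the text only states that at least three frames are used; `frame_triples` is a hypothetical helper.

```python
def frame_triples(num_frames):
    """Return (first, second, third) index triples for consecutive frames."""
    return [(i, i + 1, i + 2) for i in range(num_frames - 2)]


fps, seconds = 30, 60
n_frames = fps * seconds  # a one-minute segment at 30 FPS
triples = frame_triples(n_frames)
print(n_frames, len(triples), triples[0], triples[-1])
# 1800 1798 (0, 1, 2) (1797, 1798, 1799)
```

With overlapping windows, each interior frame serves as the first, second, and third frame of different triples, so every pair of consecutive frames contributes one optical flow map that can be reused by two adjacent triples.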

The image processing apparatus 102 may be configured to derive a first optical flow map based on motion information in the first frame and the second frame of the captured plurality of frames. The first optical flow map may be generated based on a difference of pixel locations of the plurality of pixels in the second frame and the first frame. An optical flow map represents the distribution of apparent velocities of objects in an image. The image processing apparatus 102 may be further configured to utilize the derived first optical flow map to compute a plurality of first motion vector values. The plurality of first motion vector values may correspond to a relative movement of each of the plurality of pixels from the first frame to the second frame.
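
The disclosure does not name a particular optical flow algorithm. As one illustrative stand-in, the Python/NumPy sketch below estimates a flow map by exhaustive block matching with a sum-of-absolute-differences cost. The block size, search radius, (dy, dx) layout, and the function name `block_matching_flow` are assumptions. Each motion vector is the displacement from a pixel's position in the first frame to its position in the second frame, consistent with the definition above.

```python
import numpy as np


def block_matching_flow(frame_a, frame_b, block=8, search=4):
    """Estimate an (H, W, 2) flow map of (dy, dx) vectors from frame_a to frame_b.

    Each block of frame_a is compared against every candidate window in frame_b
    within +/- `search` pixels, using the sum of absolute differences (SAD).
    All pixels in a block receive the block's best displacement. Ties keep the
    zero vector, which biases textureless regions toward "no motion".
    """
    a = np.asarray(frame_a, dtype=np.float64)
    b = np.asarray(frame_b, dtype=np.float64)
    h, w = a.shape
    flow = np.zeros((h, w, 2), dtype=np.int64)
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            y1, x1 = min(y0 + block, h), min(x0 + block, w)
            patch = a[y0:y1, x0:x1]
            best = (0, 0)
            best_cost = np.abs(b[y0:y1, x0:x1] - patch).sum()
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    if y0 + dy < 0 or x0 + dx < 0 or y1 + dy > h or x1 + dx > w:
                        continue  # candidate window would leave frame_b
                    cost = np.abs(b[y0 + dy:y1 + dy, x0 + dx:x1 + dx] - patch).sum()
                    if cost < best_cost:
                        best_cost, best = cost, (dy, dx)
            flow[y0:y1, x0:x1] = best
    return flow


# A bright 8x8 square moves by (dy, dx) = (2, 3) between two 32x32 frames.
f1 = np.zeros((32, 32))
f1[8:16, 8:16] = 255
f2 = np.zeros((32, 32))
f2[10:18, 11:19] = 255
flow_12 = block_matching_flow(f1, f2)
print(flow_12[8, 8].tolist())  # [2, 3]
```

Block matching is chosen here only because it is short and self-contained; dense methods such as Lucas-Kanade or Farneback would typically give smoother per-pixel vectors, and the same call on the second and third frames would yield the second optical flow map.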

The image processing apparatus 102 may be configured to derive a second optical flow map based on motion information of the second frame and the third frame of the plurality of frames captured by the image-capture device 102A. The second optical flow map may be generated based on a difference of pixel locations of the plurality of pixels in the second frame and the third frame. The image processing apparatus 102 may be further configured to utilize the derived second optical flow map to compute a plurality of second motion vector values. The plurality of second motion vector values may correspond to a relative movement of each of the plurality of pixels from the second frame to the third frame. The computation of the plurality of first motion vector values and the plurality of second motion vector values is explained in detail, for example, in FIG. 2.

The image processing apparatus 102 may be configured to identify a first foreground region across the first frame and the second frame based on the derived first optical flow map. The first foreground region may include at least one moving object in the plurality of frames. In some embodiments, a plurality of moving objects may be in the first foreground region. Hereinafter, the plurality of moving objects or the at least one moving object may be interchangeably referred to as moving objects. The identification of the first foreground region (or the moving objects) across the first frame and the second frame may be further based on detection of a background region in the plurality of frames. Thus, the image processing apparatus 102 may be further configured to detect a background region in each of the plurality of frames based on a plurality of sensor-based optical flow maps for the plurality of frames.

In some embodiments, the detected background region may be extracted or subtracted from the plurality of frames to identify the first foreground region. The background region may be further detected based on gyro information and the optical flow maps derived from the displacement of pixels across different frames of the captured plurality of frames. The image processing apparatus 102 may be further configured to identify a second foreground region across the second frame and the third frame based on the derived second optical flow map. The identified second foreground region may include moving objects in the second frame and the third frame of a scene.

For example, a set of frames (I) of the scene 108 may include a first frame (I₁), a second frame (I₂), and a third frame (I₃) arranged in a sequential order. A displacement of a first region of 128 by 128 pixels between the second frame (I₂) and the first frame (I₁) may be detected as a first foreground region (F₁), and a further displacement of a second region of 320 by 240 pixels between the second frame (I₂) and the third frame (I₃) may subsequently be detected as a second foreground region (F₂).
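
The region sizes in this example can be read off a binary foreground mask by taking the bounding box of its foreground pixels. The Python/NumPy sketch below is illustrative only: the 480 x 640 frame, the region positions, and the reading of 320 by 240 as width by height are assumptions, and `bounding_box` is a hypothetical helper.

```python
import numpy as np


def bounding_box(mask):
    """Return (top, left, height, width) of the foreground pixels, or None."""
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    if ys.size == 0:
        return None
    top, left = int(ys.min()), int(xs.min())
    return top, left, int(ys.max()) - top + 1, int(xs.max()) - left + 1


f1 = np.zeros((480, 640), dtype=bool)
f1[100:228, 200:328] = True   # F1: 128 rows x 128 columns
f2 = np.zeros((480, 640), dtype=bool)
f2[50:290, 300:620] = True    # F2: 240 rows x 320 columns
print(bounding_box(f1))  # (100, 200, 128, 128)
print(bounding_box(f2))  # (50, 300, 240, 320)
```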

The image processing apparatus 102 may be further configured to warp the identified first foreground region associated with the moving objects across the first frame and the second frame based on the derived first optical flow map. The warping of the identified first foreground region may be further based on a mapping of each pixel of a plurality of pixels in the second frame to a pixel in the first frame. The image processing apparatus 102 may be further configured to detect the moving objects based on a combination of a plurality of pixels in the warped first foreground region (i.e., the warped image of the first foreground region) and a plurality of pixels in the identified second foreground region that may correspond to the moving objects in the third frame. The detection of the at least one moving object may be further based on an overlap of the second foreground region for the at least one moving object over the warped first foreground region for the corresponding at least one moving object. In accordance with an embodiment, the image processing apparatus 102 may be configured to utilize the moving objects detected in the plurality of frames to execute different object-based operations, for example, auto-focus on the detected moving objects or a modification of visual parameters associated with the detected moving objects.
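
The text refers to a "combination" and an "overlap" of the two foreground regions without defining either. The Python/NumPy sketch below reads the combination as the pixel-wise intersection of the warped first foreground mask and the second foreground mask, and treats the object as verified when that intersection covers enough of the second region. The 0.5 threshold and the name `combine_foregrounds` are hypothetical; a union of the two masks would be another possible reading.

```python
import numpy as np


def combine_foregrounds(warped_fg1, fg2, min_overlap=0.5):
    """Combine the warped first foreground mask with the second one.

    The final foreground keeps only pixels present in both masks (their
    overlap). The object is reported as detected when the overlap covers at
    least `min_overlap` of the second foreground region.
    """
    a = np.asarray(warped_fg1, dtype=bool)
    b = np.asarray(fg2, dtype=bool)
    final = a & b
    area_b = int(b.sum())
    ratio = float(final.sum()) / area_b if area_b else 0.0
    return final, ratio, ratio >= min_overlap


warped_fg1 = np.zeros((6, 6), dtype=bool)
warped_fg1[1:4, 1:4] = True
fg2 = np.zeros((6, 6), dtype=bool)
fg2[2:5, 2:5] = True
final, ratio, detected = combine_foregrounds(warped_fg1, fg2)
print(int(final.sum()), round(ratio, 3), detected)  # 4 0.444 False
```

Requiring agreement between two independently estimated flow maps is what suppresses isolated noisy flow vectors: a spurious region found in only one frame pair does not survive the intersection.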

It may be noted that the moving object detection may be further utilized to obtain further information associated with the detected objects in the captured plurality of frames. Such information may be further utilized in different applications, for example, human-computer interactions, robotics (e.g., service robots), consumer electronics (e.g., smart-phones), security (e.g., recognition, tracking), retrieval (e.g., search engines, photo management), and transportation (e.g., autonomous and assisted driving), without a deviation from the scope of the disclosure.

The image processing apparatus 102 may detect different moving objects (which may occlude or deform) for different applications, and these applications may pose different requirements, such as an optimal processing time, robustness to occlusion, invariance to rotation, robust detection under pose change, and the like. In some cases, the detection may be further utilized to detect at least one of multiple types of objects that span different object classes (such as humans, animals, vehicles, and the like) or a single type of object from different views (for example, side and frontal views of vehicles).

It may be further noted that the plurality of objects (for example, the plurality of objects 110) may be detected and tracked invariant to the scale, position, occlusion, illumination, and orientation of different objects with respect to a state of the image-capture device 102A. The operation of the image processing apparatus 102 is further described in detail, for example, in FIG. 2 and FIG. 3.

It may be further noted that the image-capture device 102A may be locally present around the image processing apparatus 102 or may be integrated with the image processing apparatus 102. However, the disclosure may not be so limited, and the image processing apparatus 102 may be implemented remotely at a cloud media server, such as the server 104, without a deviation from the scope of the disclosure.

FIG. 2 is a block diagram that illustrates an exemplary image processing apparatus for detection of moving objects in a sequence of frames, in accordance with an embodiment of the disclosure. FIG. 2 is explained in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a block diagram of the image processing apparatus 102. The image processing apparatus 102 may include a circuitry 200, a network interface 202, a memory 206, and an input/output (I/O) device 208. The circuitry 200 may further include a processor 204, an optical flow generator 210, a background detector 212, and an object detector 214. Similarly, the I/O device 208 may include the image-capture device 102A and a display screen 208A. In some cases, the image-capture device 102A may be communicatively coupled to the image processing apparatus 102, via the communication network 106. The circuitry 200 may be communicatively coupled with the network interface 202, the memory 206, and the I/O device 208, via a set of communication ports/channels.

The network interface 202 may comprise suitable logic, circuitry, and interfaces that may be configured to establish communication between the image processing apparatus 102 and the server 104, via the communication network 106. The network interface 202 may be implemented by use of various known technologies to support wired or wireless communication of the image processing apparatus 102 with the communication network 106. The network interface 202 may include, but is not limited to, an antenna, a radio frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a subscriber identity module (SIM) card, and a local buffer.

The network interface 202 may communicate via wireless communication with networks, such as the Internet, an Intranet, and a wireless network, such as a cellular telephone network, a wireless local area network (WLAN), and a metropolitan area network (MAN). The wireless communication may use any of a plurality of communication standards, protocols, and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), Long Term Evolution (LTE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and IEEE 802.11n), voice over Internet Protocol (VoIP), light fidelity (Li-Fi), Wi-MAX, a protocol for email, instant messaging, and Short Message Service (SMS).

The processor 204 may comprise suitable logic, circuitry, and interfaces that may be configured to execute a set of instructions stored in the memory 206. The processor 204 may be implemented based on a number of processor technologies known in the art. Examples of the processor 204 may include, but are not limited to, a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), an x86-based processor, an x64-based processor, a Reduced Instruction Set Computing (RISC) processor, an Application-Specific Integrated Circuit (ASIC) processor, and a Complex Instruction Set Computing (CISC) processor.

The memory 206 may comprise suitable logic, circuitry, and interfaces that may be configured to store a set of instructions executable by the processor 204. The memory 206 may be configured to store data associated with operating systems and associated applications. The memory 206 may further store instructions and control signal data that may be utilized to detect different objects engaged in motion in successive frames of the captured plurality of frames. Examples of implementation of the memory 206 may include, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Hard Disk Drive (HDD), a Solid-State Drive (SSD), a CPU cache, or a Secure Digital (SD) card.

The I/O device 208 may comprise suitable logic, circuitry, and interfaces that may be configured to provide an I/O channel/interface between a user and the different operational components of the image processing apparatus 102. The I/O device 208 may receive an input from a user and present an output based on the provided input from the user. The I/O device 208 may include various input and output ports to connect various other I/O devices that may communicate with different operational components of the image processing apparatus 102. Examples of the input device may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, a microphone, and other image-capture devices. Examples of the output device may include, but are not limited to, a display (for example, the display screen 208A), a speaker, and a haptic or other sensory output device.

The display screen 208A may comprise suitable logic, circuitry, and interfaces that may be configured to display the captured plurality of frames and the detected moving objects in the captured plurality of frames. The display screen 208A may be realized through several known technologies such as, but not limited to, at least one of a Liquid Crystal Display (LCD) display, a Light Emitting Diode (LED) display, a plasma display, an Organic LED (OLED) display technology, or other display technologies. In accordance with an embodiment, the display screen 208A may refer to a display screen of a smart-glass device, a see-through display, a projection-based display, an electro-chromic display, or a transparent display. The see-through display may be a transparent or a semi-transparent display. In accordance with an embodiment, the see-through display or the projection-based display may generate an optical illusion that the segmented object is floating in air at a pre-determined distance from a user's eye and thereby provide an enhanced user experience.

The optical flow generator 210 may comprise suitable logic, circuitry, and interfaces that may be configured to derive a first optical flow map based on motion information of a first frame and a second frame in a captured plurality of frames. The optical flow generator 210 may be further configured to derive a second optical flow map based on motion information of the second frame and a third frame in the plurality of frames. Also, the optical flow generator 210 may be further configured to derive a plurality of sensor-based optical flow maps from the plurality of frames. In some embodiments, the optical flow generator 210 may be implemented as a single hardware processor, a cluster of processors, or a specialized hardware circuitry (as shown) in the image processing apparatus 102. For example, the optical flow generator 210 may be implemented based on one of an x86-based processor, a RISC processor, a field programmable gate array (FPGA), an ASIC processor, a programmable logic ASIC (PL-ASIC), a CISC processor, and other hardware processors. In other embodiments, the optical flow generator 210 may be implemented as programs/instructions executable at the processor 204 that may execute the functions of the optical flow generator 210 in a sequential or parallelized execution paradigm.

The background detector 212 may comprise suitable logic, circuitry, and interfaces that may be configured to identify a background region in different frames of the captured plurality of frames. Such a background region may be a static background region or may undergo some modification in successive frames. In some embodiments, the background detector 212 may be implemented as a single hardware processor, a cluster of processors, or a specialized hardware circuitry (as shown) in the image processing apparatus 102. For example, the background detector 212 may be implemented based on one of an x86-based processor, a RISC processor, a field programmable gate array (FPGA), an ASIC processor, a programmable logic ASIC (PL-ASIC), a CISC processor, and other hardware processors. In other embodiments, the background detector 212 may be implemented as programs/instructions executable at the processor 204 that may execute the functions of the background detector 212 in a sequential or parallelized execution paradigm.

The object detector 214 may comprise suitable logic, circuitry, and interfaces that may be configured to detect the moving objects based on a combination of a plurality of pixels in a warped first foreground region (from the first and second frames) and a plurality of pixels in a second foreground region for the moving objects (from the second and third frames). In some embodiments, the object detector 214 may be implemented as a single hardware processor, a cluster of processors, or a specialized hardware circuitry (as shown) in the image processing apparatus 102. For example, the object detector 214 may be implemented based on one of an x86-based processor, a RISC processor, a field programmable gate array (FPGA), an ASIC processor, a programmable logic ASIC (PL-ASIC), a CISC processor, and other hardware processors. In other embodiments, the object detector 214 may be implemented as programs/instructions executable at the processor 204 that may execute the functions of the object detector 214 in a sequential or parallelized execution paradigm.

In operation, the image-capture device 102A may capture a plurality of frames in a sequence, of a scene (for example, the scene 108) in the FOV of the image-capture device 102A. The plurality of frames may be captured based on an input provided by a user via a selection of a graphical button rendered at the display screen 208A, gesture-based inputs, voice-based inputs, or a button-press event directly from the image processing apparatus 102. Alternatively, the image processing apparatus 102 may retrieve the plurality of frames as pre-stored image data from the server 104. The plurality of frames may be a part of a video and/or selectively captured frames at closely spaced intervals, for example, from 30 milliseconds to 500 seconds. Such a plurality of frames may include at least a first frame, a second frame, and a third frame in a sequence.

The processor 204 may be configured to derive, using the optical flow generator 210, a first optical flow map based on motion information in the first frame and the second frame of the captured plurality of frames. The first optical flow map may be generated based on a difference in the locations of the plurality of pixels in the second frame and the first frame. Similarly, the processor 204 may be configured to derive, using the optical flow generator 210, a second optical flow map based on motion information of the second frame and the third frame of the plurality of frames captured by the image-capture device 102A. The second optical flow map may be generated based on a difference in the locations of the plurality of pixels in the second frame and the third frame.

The processor 204 may be configured to implement, using the background detector 212, various techniques to compute a plurality of first motion vector values for a plurality of pixels in the second frame with respect to the first frame. The plurality of first motion vector values may be computed based on the first optical flow map generated by the optical flow generator 210. The processor 204 may be configured to implement, using the background detector 212, various techniques to compute a plurality of sensor-based motion vector values for the plurality of pixels in the second frame based on an input (such as angular velocity information) received from a motion sensor (not shown) in the image-capture device 102A or the image processing apparatus 102. The processor 204 may be configured to generate, using the background detector 212, a background model for the plurality of frames based on the first optical flow map and the plurality of sensor-based optical flow maps generated by the processor 204 in conjunction with the optical flow generator 210. Such a background model may be utilized to identify the background region in each frame based on the computed plurality of first motion vector values and the computed plurality of sensor-based motion vector values.
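One way to read the comparison of first motion vector values with sensor-based motion vector values is sketched below. It is a minimal illustration, not the disclosed background model: it assumes both maps are given per pixel as `(dy, dx)` displacements, and it treats a pixel as background when its optical-flow vector is explained by the camera's own motion to within a hypothetical tolerance `tol`. The function name `classify_background` is likewise hypothetical.

```python
import math


def classify_background(flow, sensor_flow, tol=1.0):
    """Label each pixel 1 (background) or 0 (candidate foreground).

    flow:        2-D list of (dy, dx) optical-flow vectors (first motion vectors).
    sensor_flow: 2-D list of (dy, dx) vectors predicted from the motion sensor.
    tol:         hypothetical tolerance, in pixels, on the residual motion.
    """
    mask = []
    for flow_row, sensor_row in zip(flow, sensor_flow):
        row = []
        for (fy, fx), (sy, sx) in zip(flow_row, sensor_row):
            # Motion left over after removing the camera's own motion.
            residual = math.hypot(fy - sy, fx - sx)
            row.append(1 if residual <= tol else 0)
        mask.append(row)
    return mask


# Camera pans 2 px to the right; one pixel moves 5 px, so it is not background.
flow = [[(0, 2), (0, 2)], [(0, 2.5), (0, 5)]]
sensor = [[(0, 2), (0, 2)], [(0, 2), (0, 2)]]
print(classify_background(flow, sensor))  # [[1, 1], [1, 0]]
```

A fixed tolerance is the simplest choice; a practical model would likely adapt it to flow noise and sensor drift.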

The plurality of first motion vector values may correspond to a relative movement of a plurality of pixels from the first frame to the second frame. Such computation of the relative movement of the plurality of pixels from one frame to a subsequent frame may be done based on various techniques that may be known to one skilled in the art. Examples of such techniques may include, but are not limited to, a sum of absolute difference (SAD) technique, a sum of squared difference (SSD) technique, a weighted sum of absolute difference (WSAD) technique, and a weighted sum of squared difference (WSSD) technique. Other techniques known in the art may also be implemented to compute the relative movement of the plurality of pixels, without a deviation from the scope of the disclosure.
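As an illustration of how SAD or SSD can yield a motion vector, the sketch below performs exhaustive block matching between two grayscale frames stored as nested lists. The block size, search radius, and helper names are assumptions made for the example; the weighted variants (WSAD, WSSD) would multiply each term by a per-pixel weight and are not shown.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared differences between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def extract_block(frame, r, c, size):
    """Flatten the size x size block whose top-left corner is (r, c)."""
    return [frame[r + i][c + j] for i in range(size) for j in range(size)]


def match_block(prev, curr, r, c, size=2, radius=2, cost=sad):
    """Exhaustive block matching: return the motion vector (dy, dx) that
    moves the block at (r, c) in prev to its lowest-cost position in curr,
    searching every offset within +/- radius pixels."""
    ref = extract_block(prev, r, c, size)
    rows, cols = len(curr), len(curr[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rr, cc = r + dy, c + dx
            # Skip candidate blocks that would fall outside the frame.
            if 0 <= rr <= rows - size and 0 <= cc <= cols - size:
                score = cost(ref, extract_block(curr, rr, cc, size))
                if best is None or score < best[0]:
                    best = (score, dy, dx)
    return best[1], best[2]


# A bright 2x2 patch moves one row down and two columns right.
prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        prev[1 + i][1 + j] = 9
        curr[2 + i][3 + j] = 9
print(match_block(prev, curr, 1, 1))            # (1, 2)
print(match_block(prev, curr, 1, 1, cost=ssd))  # (1, 2)
```

Repeating `match_block` over a grid of blocks gives a coarse flow map. Exhaustive search is the easiest to follow but scales with the square of the radius.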

In accordance with an embodiment, the processor 204 may be further configured to compute, using the background detector 212, a histogram distribution for a plurality of pixels (a specific region) in each frame from the plurality of frames. The histogram distribution may be generated to further analyze the plurality of pixels in each frame based on a threshold that may be set for the histogram distribution. The processor 204 may be further configured to assign, using the background detector 212, a probability value to each pixel that may be mapped to a sample in the histogram distribution based on whether the pixel belongs to the foreground region or the background region, in accordance with a specific probability distribution. For example, a probability value may be assigned based on a comparison of a pixel value with the set threshold value. Further, the processor 204 may be configured to utilize the histogram distribution to classify, using the background detector 212, specific regions (of pixels) in each frame as background pixels or foreground pixels. Thus, the plurality of pixels may be further classified as the background region or the foreground region. Thereafter, the processor 204 may be further configured to segregate, using the background detector 212, the detected background region and different foreground regions in the plurality of frames.
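The sketch below illustrates histogram-based classification under several stated assumptions. The histogrammed feature is taken to be a quantized per-pixel value such as motion magnitude, since the text does not fix the feature. Each pixel's probability is taken to be the relative frequency of its histogram bin, so that common values, which often belong to a large uniform background, score high. The threshold comparison directly yields the foreground/background mask. The name `histogram_classify` is hypothetical.

```python
from collections import Counter


def histogram_classify(values, threshold):
    """Histogram-based pixel classification (illustrative sketch).

    values:    2-D list of non-negative integers, e.g. quantized per-pixel
               motion magnitudes (an assumption; the text does not fix the feature).
    threshold: hypothetical cut-off; values above it are treated as foreground.
    Returns (histogram, per-pixel probability map, binary foreground mask).
    """
    flat = [v for row in values for v in row]
    hist = Counter(flat)
    total = len(flat)
    # Probability assigned to each pixel: relative frequency of its bin.
    prob_map = [[hist[v] / total for v in row] for row in values]
    # Compare each pixel value with the threshold: 1 = foreground, 0 = background.
    mask = [[1 if v > threshold else 0 for v in row] for row in values]
    return hist, prob_map, mask


values = [[0, 0, 0], [0, 0, 5], [0, 0, 6]]
hist, prob, mask = histogram_classify(values, threshold=3)
print(hist[0], mask)  # 7 [[0, 0, 0], [0, 0, 1], [0, 0, 1]]
```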

The processor 204 may be further configured to identify, using the object detector 214, a first foreground region that corresponds to the moving objects across the first frame and the second frame based on the detected background region. The first foreground region may be identified after the detected background region has been removed from the first frame and the second frame. The first foreground region may be identified most clearly where a difference between the second frame and the first frame exhibits intensity changes at the pixel locations that change between the first frame and the second frame. The aforementioned technique for detection of the first foreground region may be efficient in cases where all foreground pixels move and all background pixels remain fixed or static.

The processor 204 may be further configured to apply, using the background detector 212, a threshold intensity value to a difference image, obtained from the difference between the second frame and the first frame, to improve the subtraction of the background region from the first frame and the second frame. Pixel intensities that correspond to a relative movement of the plurality of pixels from the first frame to the second frame may be filtered on the basis of the threshold value. Foreground detection with faster movements may require higher thresholds, and slower movements may require lower thresholds.
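A minimal sketch of thresholding a difference image follows. It assumes grayscale frames as nested lists of intensities, and the function name is hypothetical. A pixel is kept as a foreground candidate only when its absolute intensity change exceeds the threshold. The example shows a small change surviving a low threshold but being filtered by a higher one.

```python
def difference_mask(prev, curr, threshold):
    """Binary mask (1 = changed) of pixels whose absolute intensity
    difference between two frames exceeds the given threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


prev = [[10, 10], [10, 10]]
curr = [[10, 40], [12, 10]]
print(difference_mask(prev, curr, 5))  # [[0, 1], [0, 0]]
print(difference_mask(prev, curr, 1))  # [[0, 1], [1, 0]]
```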

The processor 204 may be further configured to warp, using the object detector 214, the identified first foreground region associated with the moving objects across the second frame based on the derived first optical flow map. The warping may be done to precisely determine a position of the first foreground region. In some cases, the optical flow map-based identification of the first foreground region may include noise and imprecise results (for example, an excess region marked as the detected foreground region). Therefore, to compensate for the noise in the identification of the first foreground region, a plurality of pixels displaced in successive frames may be mapped based on the warping. In accordance with an embodiment, the processor 204 may be configured to execute, using the object detector 214, warping of the first foreground region that may define a plurality of first points on the first frame and a plurality of second points (as pixel positions) in the second frame. A mathematical model may be utilized to execute a geometrical mapping that may relate the plurality of first points in the first frame precisely to the plurality of second points in the second frame of the plurality of frames. Additionally, the geometrical mapping may be further utilized to verify whether all pixel positions in the second frame map back to pixel positions in the first (canonical) frame. The first foreground region may be warped by applying the first optical flow to the first foreground region identified based on the first optical flow map.
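One simple form of the warping step is sketched below. It is a forward warp of a binary foreground mask in which each foreground pixel is moved by its own flow vector `(dy, dx)`, rounded to the nearest pixel. This is an assumed nearest-neighbour scheme for illustration, not the disclosed mathematical model. A backward warp, which samples the first-frame mask at each second-frame position, corresponds more closely to the "map back" check described above and avoids holes.

```python
def warp_mask(mask, flow):
    """Forward-warp a binary mask with a per-pixel flow map.

    Each foreground pixel at (r, c) is moved to (r + dy, c + dx), where
    (dy, dx) = flow[r][c] is rounded to the nearest integer. Pixels that
    land outside the frame are dropped.
    """
    rows, cols = len(mask), len(mask[0])
    warped = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dy, dx = flow[r][c]
                nr, nc = r + round(dy), c + round(dx)
                if 0 <= nr < rows and 0 <= nc < cols:
                    warped[nr][nc] = 1
    return warped


mask = [[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
flow = [[(0, 2)] * 4 for _ in range(3)]  # everything moves 2 px right
print(warp_mask(mask, flow))  # [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
```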

The processor 204 may be configured to identify, using the object detector 214, the second foreground region corresponding to the at least one moving object across the second frame and the third frame based on the derived second optical flow map. The second optical flow map may be generated based on a difference in the locations of the plurality of pixels in the second frame and the third frame. The plurality of second motion vector values, included in the second optical flow map, may correspond to a relative movement of each of the plurality of pixels from the second frame to the third frame.

In accordance with an embodiment, the object detector 214 may be further configured to determine a final foreground region based on a combination of a plurality of pixels in the warped first foreground region and a corresponding plurality of pixels in the second foreground region. The determined final foreground region may correspond to the at least one moving object. The object detector 214 may be further configured to detect the moving object/foreground region based on the combination of the plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region. The object detector 214 may be further configured to detect the moving object in the determined final foreground region corresponding to the at least one moving object in the third frame. The overlapping part of the plurality of pixels in the warped first foreground region (i.e., the warped image of the verified first foreground region) and the corresponding plurality of pixels in the second foreground region may lead to detection of the actual object.
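Taking the "overlapping part" literally, the combination can be sketched as a pixel-wise logical AND of the two binary masks. This is an assumption about the combination rule, and the function name is hypothetical. Pixels that were marked in excess by only one of the two flow maps drop out of the final foreground region.

```python
def combine_masks(warped_first, second):
    """Final foreground = pixels set in BOTH binary masks (logical AND)."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(warped_first, second)]


warped_first = [[1, 1, 0], [0, 1, 0]]
second = [[0, 1, 1], [0, 1, 1]]
print(combine_masks(warped_first, second))  # [[0, 1, 0], [0, 1, 0]]
```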

The processor 204 may be further configured to apply, using the object detector 214, morphological filtration to the final foreground region detected in the plurality of frames. Dilation and erosion may be applied to the determined final foreground region to determine an accurate contour of the foreground region, which may facilitate detection of the moving objects in a computationally efficient manner. Morphological filtering may further remove unwanted pixels from the background region and improve the quality of the detected moving object. In an exemplary scenario, the morphological filtering may remove isolated pixels (erosion) and merge nearby disconnected pixels (dilation) that may correspond to the foreground region. After the application of morphological filtration, an accurate contour of the final foreground region may be obtained.
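Erosion and dilation on a binary mask can be sketched with a 3x3 square structuring element, as below. The structuring element and the border rule, under which out-of-frame neighbours are simply ignored, are assumptions made for the example. Erosion followed by dilation (opening) removes isolated pixels. Dilation followed by erosion (closing) merges nearby disconnected pixels.

```python
def _morph(mask, require_all):
    """3x3 morphology: erosion if require_all, otherwise dilation.
    Neighbours outside the frame are ignored."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [mask[r + i][c + j]
                     for i in (-1, 0, 1) for j in (-1, 0, 1)
                     if 0 <= r + i < rows and 0 <= c + j < cols]
            out[r][c] = int(all(neigh)) if require_all else int(any(neigh))
    return out


def erode(mask):
    return _morph(mask, require_all=True)


def dilate(mask):
    return _morph(mask, require_all=False)


def opening(mask):
    """Erosion then dilation: removes isolated foreground pixels."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: merges nearby disconnected foreground pixels."""
    return erode(dilate(mask))


print(closing([[1, 0, 1]]))  # [[1, 1, 1]]
```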

The processor 204 may be further configured to label, using the object detector 214, a foreground region that may correspond to the detected moving objects in the plurality of frames. The detected moving objects may be labelled with different unique identifiers, such as alphanumeric codes, color codes, and tags. The background region and the final foreground region detected based on the first optical flow map and the second optical flow map may be represented as a binary image. The processor 204 may be configured to apply, using the object detector 214, a label to each group of connected pixels of the detected final foreground region in the binary image.
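Labelling each group of connected pixels in a binary image is a standard connected-component pass. The breadth-first sketch below assumes 4-connectivity, so diagonal neighbours are not joined, and uses integer labels 1, 2, and so on as the unique identifiers. The function name is hypothetical.

```python
from collections import deque


def label_components(mask):
    """Assign labels 1, 2, ... to 4-connected groups of foreground pixels.
    Returns (label image, number of labels)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1  # start a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


mask = [[1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
print(label_components(mask))
# ([[1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]], 2)
```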

The operations performed by the image processing apparatus 102, as described in FIG. 1, may be performed by the processor 204 in conjunction with the optical flow generator 210, the background detector 212, and the object detector 214. Other operations performed by the processor 204 in conjunction with the optical flow generator 210, the background detector 212, and the object detector 214 are further described, for example, in FIG. 3.

FIG. 3 illustrates an exemplary scenario and operations for detection of moving objects in a sequence of frames by the image processing apparatus 102 of FIG. 2, in accordance with an embodiment of the disclosure. FIG. 3 is explained in conjunction with elements from FIG. 1 and FIG. 2. With reference to FIG. 3, there is shown a plurality of different stages of operations that may be executed at the image processing apparatus 102, in conjunction with the image-capture device 102A. The plurality of different stages may include a first stage, in which captured frames 302 may be retrieved and stored in the memory 206 of the image processing apparatus 102. The captured frames 302 may include a first frame 304A, a second frame 304B, and a third frame 304C of a scene, captured at different time intervals or as part of a video by the image-capture device 102A, while the background remains still.

The first frame 304A may include an object 306A and an object 306B that may be undetected at the first stage, and the second frame 304B may include the object 306A and the object 306B at positions different from their respective positions in the first frame 304A. Similarly, the third frame 304C may include the object 306A and the object 306B at positions that may be different from the respective positions in the first frame 304A and the second frame 304B. The processor 204 may be configured to derive, using the optical flow generator 210, a first optical flow map based on motion information of the first frame 304A and the second frame 304B of the captured frames 302. The plurality of first motion vector values may correspond to a relative movement of each of the plurality of pixels from the first frame 304A to the second frame 304B in the captured frames 302. Accordingly, the first optical flow map may be generated based on computation of an apparent distance and direction of displacement of pixels in the second frame 304B with respect to the first frame 304A of the captured frames 302.
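The "apparent distance and direction of displacement" of a pixel can be obtained from its motion vector with `math.hypot` and `math.atan2`, as sketched below. The `(dy, dx)` ordering and the angle convention (degrees from the +x column axis toward +y, with rows increasing downward) are assumptions chosen for the example.

```python
import math


def flow_to_polar(dy, dx):
    """Return (distance, direction) of one displacement vector.

    distance is in pixels; direction is in degrees, measured from the +x
    (column) axis toward +y (rows grow downward in image coordinates).
    """
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


def polar_map(flow):
    """Apply flow_to_polar to every (dy, dx) vector in a 2-D flow map."""
    return [[flow_to_polar(dy, dx) for dy, dx in row] for row in flow]


print(flow_to_polar(0, 3))     # (3.0, 0.0)
print(flow_to_polar(4, 3)[0])  # 5.0
```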

The processor 204 may be further configured to derive, using the optical flow generator 210, a sensor-based optical flow map based on the second frame 304B and subsequently the third frame 304C from the captured frames 302. The background region may be detected based on angular velocity information, provided by a motion sensor, for a plurality of pixels in each frame of the captured frames 302. Examples of implementation of the motion sensor may include, but are not limited to, a gyro sensor and an accelerometer. The background region may be further identified in each frame based on the computed plurality of first motion vector values and the computed plurality of sensor-based motion vector values.
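The document does not specify how angular velocity becomes a sensor-based flow map. A common small-angle approximation for a pinhole camera undergoing pure rotation is that the image shifts by roughly focal length (in pixels) x angular velocity x frame interval. The sketch below uses that approximation. It assumes no translation and no roll, a uniform shift for every pixel, and arbitrary sign conventions. The function name and parameters are hypothetical.

```python
def sensor_flow(omega_x, omega_y, dt, focal_px, rows, cols):
    """Approximate per-pixel flow (dy, dx) caused by camera rotation.

    Assumes a pinhole camera, small rotation angles, and pure rotation
    (no translation, no roll), so every pixel shifts by the same amount:
    the yaw rate omega_y (rad/s) shifts the image horizontally and the
    pitch rate omega_x (rad/s) shifts it vertically. Signs are assumptions.
    """
    dx = focal_px * omega_y * dt
    dy = focal_px * omega_x * dt
    return [[(dy, dx)] * cols for _ in range(rows)]


# 8 px focal length, 0.25 rad/s yaw, 0.5 s between frames -> 1 px shift.
print(sensor_flow(0.0, 0.25, 0.5, 8.0, 2, 3)[1][2])  # (0.0, 1.0)
```

A map of this form can serve as the sensor-based motion vector values against which the optical-flow vectors are compared when separating background from moving objects.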

Alternatively, a histogram may be utilized to classify the plurality of pixels in each frame as background pixels or foreground pixels. A probability distribution may be constructed in the form of a histogram for the plurality of pixels in each frame of the captured frames 302. The histogram may be generated to analyze the plurality of pixels in each frame and to determine, from a comparison of pixel values with a threshold value, a probability of whether a pixel mapped in the histogram is part of a foreground region or a background region.

At the second stage, the processor 204 may be further configured to identify a first foreground region 308 that may include a region covered by the object 306A and the object 306B. The first foreground region 308 may be identified based on the first optical flow map generated from the first frame 304A and the second frame 304B. The processor 204 may be further configured to identify, using the object detector 214, a second foreground region 310 that may include the object 306A and the object 306B at two different positions in the third frame 304C with respect to the second frame 304B. The second foreground region 310 may be identified based on the derived second optical flow map. The second optical flow map may be generated based on a difference in the locations of a plurality of pixels in the second frame 304B and the third frame 304C of the captured frames 302. Depending on the method used to generate the first optical flow map, the first foreground region 308 in the generated first optical flow map may be larger than the actual moving objects, such as the object 306A and the object 306B. Therefore, both the first optical flow map and the second optical flow map are utilized to identify the moving objects, such as the object 306A and the object 306B, accurately.

The processor 204 may be further configured to warp the first foreground region 308 associated with the object 306A and the object 306B across the first frame 304A and the second frame 304B, based on the first optical flow map. The processor 204 may be further configured to combine a warped first foreground region 312 with the second foreground region 310 to determine the final foreground region associated with the moving objects, such as the object 306A and the object 306B. The processor 204 may be configured to combine the warped first foreground region 312 with the second foreground region 310 based on an overlap region that corresponds to common pixels of the warped first foreground region 312 and the second foreground region 310. The processor 204 may be further configured to detect an object frame 314 based on the determined final foreground region. The processor 204 may be configured to detect the object 306A and the object 306B based on the detected object frame 314. The detected object frame 314 may include the moving objects, such as the object 306A and the object 306B, and may indicate their actual positions.

FIG. 4 depicts a flow chart that illustrates an exemplary method for moving object detection in a sequence of frames, in accordance with an embodiment of the disclosure. The flow chart 400 is described in conjunction with elements from FIGS. 1, 2, and 3. With reference to FIG. 4, there is shown a flow chart 400. The method in the flow chart 400 starts at 402 and proceeds to 404.

At 404, a plurality of frames that may include a first frame, a second frame, and a third frame may be captured. The image-capture device 102A may be configured to capture the plurality of frames, which may include the first frame, the second frame, and the third frame, sequentially.

At 406, a first optical flow map may be derived based on motion information of a first frame and a second frame. The processor 204 may be configured to utilize the optical flow generator 210 to derive the first optical flow map based on motion information of the first frame and the second frame. A plurality of first motion vector values may be computed based on the first optical flow map. The plurality of first motion vector values may correspond to a relative movement of each of the plurality of pixels from the first frame to the second frame.

At 408, a first foreground region that may correspond to at least one moving object across the first frame and the second frame may be identified based on the derived first optical flow map. The processor 204 may be configured to utilize the background detector 212 to identify the first foreground region that may correspond to the at least one moving object across the first frame and the second frame based on the derived first optical flow map. The identified first foreground region may be segregated from the background region in the first frame and the second frame. The identification of the background region in the current frame may be based on the computed plurality of first motion vector values and the computed plurality of sensor-based motion vector values.

At 410, the identified first foreground region may be warped based on the first optical flow map. The processor 204 may be configured to utilize the object detector 214 to warp the identified first foreground region associated with the moving object across the first frame and the second frame, based on the derived first optical flow map.

At 412, a second optical flow map may be derived based on motion information of the second frame and the third frame of the captured plurality of frames. The processor 204 may be configured to utilize the optical flow generator 210 to derive the second optical flow map based on motion information of the second frame and the third frame of the plurality of frames. The second optical flow map may be derived based on a difference in the locations of the plurality of pixels in the second frame and the third frame. The plurality of second motion vector values may correspond to a relative movement of each of the plurality of pixels from the second frame to the third frame.

At 414, the second foreground region that may correspond to the at least one moving object across the second frame and the third frame may be identified based on the derived second optical flow map. The processor 204 may be configured to utilize the object detector 214 to identify the second foreground region that corresponds to the at least one moving object across the second frame and the third frame based on the derived second optical flow map.

At 416, a final foreground region may be determined based on a combination of a plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region. The final foreground region may be associated with the at least one moving object in the first frame, the second frame, and the third frame. The processor 204 may be configured to utilize the object detector 214 to detect the moving object based on a combination of a plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region.

At 418, morphological operations may be applied to the determined final foreground region to reduce noise. The determined final foreground region corresponds to the detected moving objects, such as the object 306A and the object 306B, in the plurality of frames. The processor 204 may be configured to utilize the object detector 214 to apply morphological filtration to the final foreground region, which is determined based on the combination of the plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region in the plurality of frames. The object detector 214 may be further configured to detect the moving objects, such as the object 306A and the object 306B, based on the final foreground region.

At 420, the detected moving objects may be labelled in the determined final foreground region in the plurality of frames. The processor 204 may be configured to utilize the object detector 214 to label the detected moving objects, such as the object 306A and the object 306B, in the plurality of frames. The detection and labeling of the object 306A and the object 306B have been shown and described, for example, in FIG. 3. Control passes to end.
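The ordering of steps 404 to 420 can be sketched as a loop over a stream of frames that keeps at most three frames in memory at a time. Every stage function here is a hypothetical caller-supplied callable standing in for the optical flow generator, the foreground identification, the warping, and the other steps. Only the control flow reflects the flow chart 400.

```python
from collections import deque


def detect_stream(frames, flow_fn, foreground_fn, warp_fn,
                  combine_fn, clean_fn, label_fn):
    """Run the steps of flow chart 400 over a stream of frames, keeping
    at most three frames in memory. All *_fn arguments are hypothetical
    caller-supplied stage functions."""
    window = deque(maxlen=3)  # oldest frame is dropped automatically
    results = []
    for frame in frames:
        window.append(frame)                        # 404: capture
        if len(window) < 3:
            continue
        f1, f2, f3 = window
        flow12 = flow_fn(f1, f2)                    # 406: first flow map
        fg1 = foreground_fn(flow12)                 # 408: first foreground
        warped = warp_fn(fg1, flow12)               # 410: warp
        flow23 = flow_fn(f2, f3)                    # 412: second flow map
        fg2 = foreground_fn(flow23)                 # 414: second foreground
        final = combine_fn(warped, fg2)             # 416: combine
        results.append(label_fn(clean_fn(final)))   # 418, 420: filter, label
    return results


# Toy stages on scalar "frames", only to show the data flow.
print(detect_stream([0, 1, 3, 6], lambda a, b: b - a, lambda f: f,
                    lambda g, f: g + f, min, lambda m: m, lambda m: m))
# [2, 3]
```

In a sliding window, the second flow map of one step equals the first flow map of the next, so a practical version could cache it instead of recomputing it.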

The present disclosure may provide several advantages over conventional object detection technologies. It may be noted that more than three frames may be captured at a time. However, the image processing apparatus may advantageously detect moving objects in a maximum of three successive frames from a set of captured frames. Such detection of the moving objects in a maximum of three frames may facilitate optimal memory utilization and bandwidth usage, and thereby leave more computational resources for other operations on the image processing device, with an insignificant impact on battery power. Such an image processing apparatus may be further utilized for low power applications (for example, a smart band) that may continuously detect moving objects from a set of three frames. The implementation of pixel-level warping techniques may further facilitate precise determination of contours or regions occupied by different moving objects in successive frames. Additionally, the warping process and the subsequent combination of successively identified foreground regions in subsequent frames may be utilized to precisely detect objects in motion.

The disclosed image processing apparatus may be implemented in various application areas, such as video surveillance or tracking of moving objects, and auto-focus in camera applications while an input video is captured. The disclosed image processing apparatus and method may be suited for real-world tracking applications, such as video surveillance, car tracking for autonomous navigation, gaming systems, or other real-time or near-real-time object detection and segmentation of such moving objects.

The present disclosure may be realized in hardware, or a combination of hardware and software. The present disclosure may be realized in a centralized fashion, in at least one computer system, or in a distributed fashion, where different elements may be spread across several interconnected computer systems. A computer system or other apparatus adapted to carry out the methods described herein may be suited. A combination of hardware and software may be a general-purpose computer system with a computer program that, when loaded and executed, may control the computer system such that it carries out the methods described herein. The present disclosure may be realized in hardware that comprises a portion of an integrated circuit that also performs other functions.

The present disclosure may also be embedded in a computer program product, which comprises all the features that enable the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program, in the present context, means any expression, in any language, code or notation, of a set of instructions intended to cause a system with an information processing capability to perform a particular function either directly, or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

While the present disclosure has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departure from the scope of the present disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departure from its scope. Therefore, it is intended that the present disclosure not be limited to the particular embodiment disclosed, but that the present disclosure will include all embodiments that fall within the scope of the appended claims.

What is claimed is:
 1. An apparatus for detecting moving objects in successive frames, the apparatus comprising: a memory; an image-capture device; and circuitry communicatively coupled to the memory and the image-capture device, wherein the circuitry is configured to: derive a first optical flow map based on motion information of a first frame and a second frame of a plurality of frames captured by the image-capture device; derive a second optical flow map based on motion information of the second frame and a third frame of the plurality of frames; identify a first foreground region corresponding to at least one moving object across the first frame and the second frame based on the derived first optical flow map; warp the identified first foreground region associated with the at least one moving object across the first frame and the second frame based on the derived first optical flow map; identify a second foreground region corresponding to the at least one moving object across the second frame and the third frame based on the derived second optical flow map; and detect the at least one moving object based on combination of a plurality of pixels in the warped first foreground region and a corresponding plurality of pixels in the second foreground region.
 2. The apparatus according to claim 1, wherein the image-capture device is further configured to: capture the plurality of frames in a field-of-view of the image-capture device; and transfer the captured plurality of frames that comprises the first frame, the second frame and the third frame to the circuitry.
 3. The apparatus according to claim 1, wherein the circuitry is further configured to detect a background region in each of the plurality of frames based on a plurality of optical flow maps for the plurality of frames and gyro-based information.
 4. The apparatus according to claim 1, wherein the first optical flow map comprises a plurality of first motion vector values that corresponds to a motion of the plurality of pixels from the first frame to the second frame.
 5. The apparatus according to claim 1, wherein the derivation of the first optical flow map is further based on a difference of a pixel position of the plurality of pixels in the second frame and the first frame.
 6. The apparatus according to claim 1, wherein the second optical flow map comprises a plurality of second motion vector values that corresponds to a motion of the plurality of pixels from the second frame to the third frame.
 7. The apparatus according to claim 1, wherein the derivation of the second optical flow map is further based on a difference of a pixel position of the plurality of pixels in the third frame and the second frame.
 8. The apparatus according to claim 1, wherein the identification of a first foreground region corresponds to at least one moving object across the first frame and the second frame based on the derived first optical flow map.
 9. The apparatus according to claim 1, wherein the identification of a first foreground region corresponds to at least one moving object across the first frame and the second frame based on the derived first optical flow map.
 10. The apparatus according to claim 1, wherein the identification of the first foreground region in the plurality of frames is further based on detection of a background region.
 11. The apparatus according to claim 1, wherein the circuitry is further configured to determine a final foreground region based on a combination of the plurality of pixels in the warped first foreground region and the corresponding plurality of pixels in the second foreground region.
 12. The apparatus according to claim 11, wherein the circuitry is further configured to apply morphological operations to the final foreground region that corresponds to the detected at least one moving object in the plurality of frames.
 13. The apparatus according to claim 12, wherein the morphological operations comprise morphological dilation and morphological erosion.
 14. The apparatus according to claim 1, wherein the circuitry is further configured to label a foreground region corresponding to the detected at least one moving object in the plurality of frames.
 15. The apparatus according to claim 1, wherein the detection of at least one moving object is further based on an overlapping of the second foreground region for the at least one moving object over the warped first foreground region for the corresponding at least one moving object.
 16. A method, comprising: in an apparatus comprising a memory, an image-capture device and circuitry communicatively coupled to the memory and the image-capture device: deriving, by the circuitry, a first optical flow map based on motion information of a first frame and a second frame of a plurality of frames captured by the image-capture device; deriving, by the circuitry, a second optical flow map based on motion information of the second frame and a third frame of the plurality of frames; identifying, by the circuitry, a first foreground region corresponding to at least one moving object across the first frame and the second frame based on the derived first optical flow map; warping, by the circuitry, the identified first foreground region associated with the at least one moving object across the first frame and the second frame based on the derived first optical flow map; identifying, by the circuitry, a second foreground region corresponding to the at least one moving object across the second frame and the third frame based on the derived second optical flow map; and detecting, by the circuitry, the at least one moving object based on combination of a plurality of pixels in the result of the warped first foreground region and a corresponding plurality of pixels in the second foreground region corresponding to the at least one moving object in the third frame.
 19. A non-transitory computer-readable medium having stored thereon computer-executable instructions which, when executed by an image processing apparatus, cause the image processing apparatus to perform operations, the operations comprising: deriving a first optical flow map based on motion information of a first frame and a second frame of a plurality of frames captured by an image-capture device; deriving a second optical flow map based on motion information of the second frame and a third frame of the plurality of frames; identifying a first foreground region corresponding to at least one moving object across the first frame and the second frame based on the derived first optical flow map; warping the identified first foreground region associated with the at least one moving object across the first frame and the second frame based on the derived first optical flow map; identifying a second foreground region corresponding to the at least one moving object across the second frame and the third frame based on the derived second optical flow map; and detecting the at least one moving object based on combination of a plurality of pixels in the warped first foreground region and a corresponding plurality of pixels in the second foreground region corresponding to the at least one moving object in the third frame.